{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<i>Copyright (c) Recommenders contributors.</i>\n",
                "\n",
                "<i>Licensed under the MIT License.</i>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Geometry Aware Inductive Matrix Completion (GeoIMC)\n",
                "\n",
                "GeoIMC is an inductive matrix completion algorithm based on the work of Jawanpuria et al. (2019) [1].\n",
                "\n",
                "Consider the case of MovieLens-100K (ML100K). Let $X \\in R^{m \\times d_1}$ and $Z \\in R^{n \\times d_2}$ be the feature matrices of users and movies, respectively. Let $M \\in R^{m \\times n}$ be the partially observed ratings matrix. GeoIMC models this matrix as $M = XUBV^TZ^T$, where $U \\in R^{d_1 \\times k}$ and $V \\in R^{d_2 \\times k}$ are orthogonal matrices (that is, they have orthonormal columns) and $B \\in R^{k \\times k}$ is a symmetric positive-definite matrix. The resulting optimization problem is solved using Pymanopt.\n",
                "\n",
                "This notebook shows how to use and evaluate the GeoIMC implementation in **recommenders**.\n"
            ]
        },
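        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "To make the factorization $M = XUBV^TZ^T$ concrete, here is a minimal NumPy sketch on toy data. It is an illustration under stated assumptions, not the GeoIMC solver: the sizes are hypothetical, the features are random, and the factors are simply sampled (orthonormal columns via a QR decomposition, a symmetric positive-definite $B$ via $AA^T + I$) rather than learned.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "rng = np.random.default_rng(0)\n",
                "\n",
                "# Toy sizes (hypothetical, far smaller than ML100K)\n",
                "m, n, d1, d2, k = 5, 4, 6, 7, 3\n",
                "\n",
                "# Feature matrices for users (X) and movies (Z)\n",
                "X = rng.standard_normal((m, d1))\n",
                "Z = rng.standard_normal((n, d2))\n",
                "\n",
                "# U and V with orthonormal columns, taken from a reduced QR decomposition\n",
                "U, _ = np.linalg.qr(rng.standard_normal((d1, k)))\n",
                "V, _ = np.linalg.qr(rng.standard_normal((d2, k)))\n",
                "\n",
                "# B symmetric positive definite, built as A A^T + I\n",
                "A = rng.standard_normal((k, k))\n",
                "B = A @ A.T + np.eye(k)\n",
                "\n",
                "# Full reconstruction M = X U B V^T Z^T\n",
                "M_hat = X @ U @ B @ V.T @ Z.T\n",
                "\n",
                "# A single entry depends only on the features of user i and movie j\n",
                "i, j = 2, 1\n",
                "m_ij = X[i] @ U @ B @ V.T @ Z[j]\n",
                "```\n",
                "\n",
                "Because each entry is computed from feature vectors alone, the same learned factors can score users or movies that were not seen during training, which is what makes the method inductive."
            ]
        },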
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": [
                "import tempfile\n",
                "import zipfile\n",
                "import pandas as pd\n",
                "import numpy as np\n",
                "\n",
                "from recommenders.datasets import movielens\n",
                "from recommenders.models.geoimc.geoimc_data import ML_100K\n",
                "from recommenders.models.geoimc.geoimc_algorithm import IMCProblem\n",
                "from recommenders.models.geoimc.geoimc_predict import Inferer\n",
                "from recommenders.evaluation.python_evaluation import rmse, mae\n",
                "from recommenders.utils.notebook_utils import store_metadata"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Choose the MovieLens dataset\n",
                "MOVIELENS_DATA_SIZE = '100k'\n",
                "# Normalize user, item features\n",
                "normalize = True\n",
                "# Rank (k) of the model\n",
                "rank = 300\n",
                "# Regularization parameter\n",
                "regularizer = 1e-3\n",
                "\n",
                "# Stopping criteria for the optimizer: iteration cap, time budget (seconds), and log verbosity\n",
                "max_iters = 150000\n",
                "max_time = 1000\n",
                "verbosity = 1"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1. Download the ML100K dataset and features"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {
                    "name": "stderr",
                    "output_type": "stream",
                    "text": [
                        "100%|██████████| 4.81k/4.81k [00:09<00:00, 519KB/s]\n"
                    ]
                }
            ],
            "source": [
                "# Create a directory to download ML100K\n",
                "dp = tempfile.mkdtemp(suffix='-geoimc')\n",
                "movielens.download_movielens(MOVIELENS_DATA_SIZE, f\"{dp}/ml-100k.zip\")\n",
                "with zipfile.ZipFile(f\"{dp}/ml-100k.zip\", 'r') as z:\n",
                "    z.extractall(dp)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 2. Load the dataset using the example features provided in helpers\n",
                "\n",
                "The features were generated using the same method as Dong et al. (2017) [2]."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 4,
            "metadata": {},
            "outputs": [],
            "source": [
                "dataset = ML_100K(\n",
                "    normalize=normalize,\n",
                "    target_transform='binarize'\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 5,
            "metadata": {},
            "outputs": [],
            "source": [
                "dataset.load_data(f\"{dp}/ml-100k/\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Characteristics:\n",
                        "\n",
                        "              target: (943, 1682)\n",
                        "              entities: (943, 1822), (1682, 1925)\n",
                        "\n",
                        "              training: (80000,)\n",
                        "              training_entities: (943, 1822), (1682, 1925)\n",
                        "\n",
                        "              testing: (20000,)\n",
                        "              test_entities: (943, 1822), (1682, 1925)\n",
                        "\n"
                    ]
                }
            ],
            "source": [
                "print(f\"\"\"Characteristics:\n",
                "\n",
                "              target: {dataset.training_data.data.shape}\n",
                "              entities: {dataset.entities[0].shape}, {dataset.entities[1].shape}\n",
                "\n",
                "              training: {dataset.training_data.get_data().data.shape}\n",
                "              training_entities: {dataset.training_data.get_entity(\"row\").shape}, {dataset.training_data.get_entity(\"col\").shape}\n",
                "\n",
                "              testing: {dataset.test_data.get_data().data.shape}\n",
                "              test_entities: {dataset.test_data.get_entity(\"row\").shape}, {dataset.test_data.get_entity(\"col\").shape}\n",
                "\"\"\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 3. Initialize the IMC problem"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "outputs": [],
            "source": [
                "np.random.seed(10)\n",
                "prblm = IMCProblem(\n",
                "    dataset.training_data,\n",
                "    lambda1=regularizer,\n",
                "    rank=rank\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 8,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Optimizing...\n",
                        "Terminated - max time reached after 1753 iterations.\n",
                        "\n"
                    ]
                }
            ],
            "source": [
                "# Solve the optimization problem\n",
                "prblm.solve(\n",
                "    max_time,\n",
                "    max_iters,\n",
                "    verbosity\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 9,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Initialize an inferer\n",
                "inferer = Inferer(\n",
                "    method='dot'\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Predict using the parametrized matrices\n",
                "predictions = inferer.infer(\n",
                "    dataset.test_data,\n",
                "    prblm.W\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 11,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Prepare the test and prediction dataframes\n",
                "user_ids = dataset.test_data.get_data().tocoo().row\n",
                "item_ids = dataset.test_data.get_data().tocoo().col\n",
                "test_df = pd.DataFrame(\n",
                "    data={\n",
                "        \"userID\": user_ids,\n",
                "        \"itemID\": item_ids,\n",
                "        \"rating\": dataset.test_data.get_data().data\n",
                "    }\n",
                ")\n",
                "predictions_df = pd.DataFrame(\n",
                "    data={\n",
                "        \"userID\": user_ids,\n",
                "        \"itemID\": item_ids,\n",
                "        \"prediction\": [predictions[uid, iid] for uid, iid in zip(user_ids, item_ids)]\n",
                "    }\n",
                ")"
            ]
        },
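        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The per-pair lookup used to build the prediction column can also be written as a single indexing operation. The sketch below assumes the score matrix is a dense NumPy array, which is an assumption about what `Inferer.infer` returns rather than something this notebook verifies, and it uses a made-up 3 x 4 matrix in place of the real predictions.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "# Toy dense score matrix (hypothetical stand-in for the real predictions)\n",
                "predictions = np.arange(12, dtype=float).reshape(3, 4)\n",
                "user_ids = np.array([0, 2, 1])\n",
                "item_ids = np.array([3, 0, 2])\n",
                "\n",
                "# Element-by-element lookup with a Python loop\n",
                "looped = [predictions[u, i] for u, i in zip(user_ids, item_ids)]\n",
                "\n",
                "# Equivalent integer-array indexing, done in one vectorized step\n",
                "gathered = predictions[user_ids, item_ids]\n",
                "```\n",
                "\n",
                "Both forms select the same entries in the same order. The vectorized form avoids a Python-level loop over the test pairs."
            ]
        },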
        {
            "cell_type": "code",
            "execution_count": 12,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "\n",
                        "RMSE: 0.496351244012414\n",
                        "MAE: 0.47524594431584\n",
                        "\n"
                    ]
                }
            ],
            "source": [
                "# Calculate RMSE\n",
                "RMSE = rmse(\n",
                "    test_df,\n",
                "    predictions_df\n",
                ")\n",
                "# Calculate MAE\n",
                "MAE = mae(\n",
                "    test_df,\n",
                "    predictions_df\n",
                ")\n",
                "print(f\"\"\"\n",
                "RMSE: {RMSE}\n",
                "MAE: {MAE}\n",
                "\"\"\")"
            ]
        },
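        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "For reference, the two metrics reduce to simple formulas once true ratings and predictions are aligned row by row. The sketch below uses made-up arrays and plain NumPy. It is not the `recommenders` implementation, which first joins the two dataframes on the user and item columns.\n",
                "\n",
                "```python\n",
                "import numpy as np\n",
                "\n",
                "# Made-up aligned targets and predictions, for illustration only\n",
                "y_true = np.array([1.0, 0.0, 1.0, 1.0])\n",
                "y_pred = np.array([0.8, 0.3, 0.6, 0.9])\n",
                "\n",
                "err = y_pred - y_true\n",
                "\n",
                "# Root mean squared error: square, average, then take the square root\n",
                "rmse_value = np.sqrt(np.mean(err ** 2))\n",
                "\n",
                "# Mean absolute error: average of the absolute differences\n",
                "mae_value = np.mean(np.abs(err))\n",
                "```\n",
                "\n",
                "RMSE penalizes large individual errors more heavily than MAE, so RMSE is never smaller than MAE on the same data."
            ]
        },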
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Record results for tests - ignore this cell\n",
                "store_metadata(\"rmse\", RMSE)\n",
                "store_metadata(\"mae\", MAE)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## References\n",
                "\n",
                "[1] Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra. _[Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach](https://www.mitpressjournals.org/doi/full/10.1162/tacl_a_00257)_. Transactions of the Association for Computational Linguistics (TACL), Volume 7, p.107-120, 2019.\n",
                "\n",
                "[2] Xin Dong, Lei Yu, Zhonghuo Wu, Yuxia Sun, Lingfeng Yuan, Fangxi Zhang. [A Hybrid Collaborative Filtering Model with Deep Structure for Recommender Systems](https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14676/13916).\n",
                "Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), p.1309-1315, 2017."
            ]
        }
    ],
    "metadata": {
        "celltoolbar": "Tags",
        "kernelspec": {
            "display_name": "Python (reco)",
            "language": "python",
            "name": "reco_base"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.6.10"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}
